Section 1 describes data management in Polymetrics and demonstrates the use of pandas for importing and analyzing the data. The dataset is imported using the FileImport module in Polymetrics.
In addition to Polymetrics, the following libraries are imported:
import pandas as pd
import numpy as np
import json
import Polymetrics as poly
import FileImport
import polymetrics_config
import traceback
from IPython.display import display
import plotly.graph_objects as go
The data used in the example is taken from a patent - US 2013/0046061 (Hermel-Davidock et al.). The patent data has a good mix of examples to demonstrate the handling of different polymer object types. The data is compiled into an Excel file - 'Example_Dataset.xlsx'.
A Polymetrics dataset is a pandas DataFrame with polymer objects arranged row-wise and their attributes stored in columns. A polymer object can be
More details are given below.
The first step is importing the dataset. The function XLSXImport imports the data and conditions some datatypes for easier processing.
df_in = FileImport.XLSXImport("Article/Example_Dataset.xlsx", sheet_name = 'Data')
The dataset is made of mixed datatypes. Column names are largely unrestricted, but a few columns must be part of every dataset because they are hardcoded in some functions.
The following columns should be present in the dataset.
UID: Unique four-character alphanumeric string to reference every polymeric object in the dataset.
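The UID requirement above can be checked with plain pandas before any analysis is run. The sketch below is not part of Polymetrics: the helper name `check_uids` is hypothetical, and the rule that a UID consists of upper-case letters and digits is an assumption based on the example UIDs shown in this section.

```python
import pandas as pd

def check_uids(df, column="UID"):
    """Return the rows whose UID is malformed or duplicated (hypothetical helper)."""
    uids = df[column].astype(str)
    # Assumption: a valid UID is exactly four characters, upper-case letters or digits.
    malformed = ~uids.str.fullmatch(r"[0-9A-Z]{4}")
    # keep=False flags every member of a duplicated group, not only the repeats.
    duplicated = uids.duplicated(keep=False)
    return df[malformed | duplicated]

# Toy data: one duplicated UID and one UID that is too short.
demo = pd.DataFrame({
    "Identifier": ["IS1", "IS2", "IS3", "IS4"],
    "UID": ["016Y", "099E", "099E", "0BH"],
})
problems = check_uids(demo)
print(problems)
```

An empty result means every polymer object can be referenced unambiguously by its UID.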
Analytical data, if present, should follow the following guidelines.
To demonstrate how data is handled in Polymetrics, df_in is shortened by selecting a few columns with mixed datatypes. The new, terse dataset is called df_short. The following columns are selected from df_in to create df_short.
df_short = df_in.loc[:,['Identifier', 'UID', 'Project', 'CDC', 'CEF_FileName', 'MFR_180C']].copy()
display(df_short.head(2))
| | Identifier | UID | Project | CDC | CEF_FileName | MFR_180C |
|---|---|---|---|---|---|---|
| 0 | IS1 | 016Y | US20130046061 | 63.8 | Article/IS1_CEF.csv | {'I2': 1.5, 'I10': 11.5} |
| 1 | IS2 | 099E | US20130046061 | 140.9 | Article/IS2_CEF.csv | {'I2': 0.4, 'I10': 7.1} |
The data types are currently used in the following way:
The different datatypes can be accessed as follows.
print('The second element in the CDC column is', df_short.loc[1, 'CDC'])
print('UID/Identifier relationship is as follows')
display(df_short[['Identifier', 'UID']].head(5))
print('Shape of the',df_short.loc[0, 'Identifier'], 'CEF data file is', pd.read_csv(df_short.loc[0,'CEF_FileName']).shape)
print('Shape of the',df_short.loc[1, 'Identifier'], 'CEF data file is', pd.read_csv(df_short.loc[1,'CEF_FileName']).shape)
The second element in the CDC column is 140.9
UID/Identifier relationship is as follows
| | Identifier | UID |
|---|---|---|
| 0 | IS1 | 016Y |
| 1 | IS2 | 099E |
| 2 | IS3 | 0BHE |
| 3 | IS4 | 0BLI |
| 4 | CS1 | 0D2Y |
Shape of the IS1 CEF data file is (83, 2)
Shape of the IS2 CEF data file is (67, 2)
Note that the CEF plot data (dWf/dT vs. T) are of unequal sizes. The CEF data was extracted from the published images using WebPlotDigitizer (https://automeris.io/WebPlotDigitizer/).
Several functions in Polymetrics return information parsed as Python dictionaries. Dictionaries stored in the dataset can be accessed similarly. Note that the keys of the dictionary objects must be entered in double quotes, as in JSON strings.
print('Dictionary objects in the polymetrics dataset are')
display(df_short[['MFR_180C']].head(5))
print('Individual element can be accessed as follows')
for i in range(0,2):
    for key in df_short['MFR_180C'][i]:
        print(key, df_short['MFR_180C'][i][key])
Dictionary objects in the polymetrics dataset are
| | MFR_180C |
|---|---|
| 0 | {'I2': 1.5, 'I10': 11.5} |
| 1 | {'I2': 0.4, 'I10': 7.1} |
| 2 | {'I2': 1.6, 'I10': 11.5} |
| 3 | {'I2': 1.0, 'I10': 8.0} |
| 4 | {'I2': 1.6, 'I10': 9.1} |
Individual element can be accessed as follows
I2 1.5
I10 11.5
I2 0.4
I10 7.1
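The double-quote requirement follows from the JSON format itself. A minimal sketch, assuming a dictionary is held as a text cell (for example in a spreadsheet) before it is converted; how XLSXImport performs this conversion internally is not shown in this section, so the snippet uses only the standard `json` module.

```python
import json

valid_cell = '{"I2": 1.5, "I10": 11.5}'     # keys in double quotes: valid JSON
invalid_cell = "{'I2': 1.5, 'I10': 11.5}"   # keys in single quotes: not valid JSON

# A valid JSON string becomes an ordinary Python dictionary.
mfr = json.loads(valid_cell)
print(mfr["I2"], mfr["I10"])

# The single-quoted variant is rejected by the JSON parser.
try:
    json.loads(invalid_cell)
    parsed_ok = True
except json.JSONDecodeError as error:
    parsed_ok = False
    print("Not valid JSON:", error)
```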
Two datasets with dissimilar columns can be combined using the Polymetrics function 'dataset_merge'.
# df_short1 takes rows 2-3 of df_in and a selection of its columns
df_short1 = df_in.iloc[2:4,[0,1,2,3,4,13,17,18,20]].copy()
# df_short2 takes rows 4-5 and the same columns except columns 4, 13 and 17
df_short2 = df_in.iloc[4:6,[0,1,2,3,18,20]].copy()
# The two datasets can be combined with a column order like the original dataset
df_combined = poly.dataset_merge(df_short1, df_short2 , order = df_in.columns, ignore_index = True)
print('---- df_short1 ----')
display(df_short1)
print('---- df_short2 ----')
display(df_short2)
---- df_short1 ----
| | Identifier | Name | UID | Project | Classification | Mz | I10 | Unsat_1M_C | Formulation_FileName |
|---|---|---|---|---|---|---|---|---|---|
| 2 | IS3 | Inventive_Sample_3 | 0BHE | US20130046061 | Inventive | 159600.0 | 11.5 | 78.0 | NaN |
| 3 | IS4 | Inventive_Sample_4 | 0BLI | US20130046061 | Inventive | 216572.0 | 8.0 | 73.0 | NaN |
---- df_short2 ----
| | Identifier | Name | UID | Project | Unsat_1M_C | Formulation_FileName |
|---|---|---|---|---|---|---|
| 4 | CS1 | Comparative_Sample_1 | 0D2Y | US20130046061 | 145.0 | NaN |
| 5 | CS2 | Comparative_Sample_2 | 0EYJ | US20130046061 | 314.0 | NaN |
One more polymer object, stored as a JSON string, is then added to the combined dataset.
# The context manager closes the file automatically
with open('Article/LDPE.json', "r") as f:
    K = json.load(f)
print('The json string is \n', json.dumps(K, indent = 4))
The json string is
[
    {
        "Identifier": "LDPE_133A",
        "Name": "LDPE_133A",
        "UID": "2IXA",
        "Type": "Resin_Commercial",
        "Density": 0.923,
        "I2": 0.25,
        "MFR_180C": [
            {
                "I2": 0.25,
                "I10": null
            }
        ]
    }
]
The JSON file is read using the FileImport module and merged with df_combined using the dataset_merge function.
df_singlePE = FileImport.JsonImport('Article/LDPE.json')
df_singlePE.index = [999]
df_extended = poly.dataset_merge(df_combined, df_singlePE , order = df_in.columns, ignore_index = True)
print('---- df_combined ----')
display(df_combined)
print('---- df_singlePE ----')
display(df_singlePE)
print('---- df_extended ----')
display(df_extended)
---- df_combined ----
| | Identifier | Name | UID | Project | Classification | Mz | I10 | Unsat_1M_C |
|---|---|---|---|---|---|---|---|---|
| 0 | IS3 | Inventive_Sample_3 | 0BHE | US20130046061 | Inventive | 159600.0 | 11.5 | 78.0 |
| 1 | IS4 | Inventive_Sample_4 | 0BLI | US20130046061 | Inventive | 216572.0 | 8.0 | 73.0 |
| 2 | CS1 | Comparative_Sample_1 | 0D2Y | US20130046061 | NaN | NaN | NaN | 145.0 |
| 3 | CS2 | Comparative_Sample_2 | 0EYJ | US20130046061 | NaN | NaN | NaN | 314.0 |
---- df_singlePE ----
| | Identifier | Name | UID | Type | Density | I2 | MFR_180C |
|---|---|---|---|---|---|---|---|
| 999 | LDPE_133A | LDPE_133A | 2IXA | Resin_Commercial | 0.923 | 0.25 | [{'I2': 0.25, 'I10': None}] |
---- df_extended ----
| | Identifier | Name | UID | Project | Classification | Type | Density | Mz | I2 | I10 | Unsat_1M_C | MFR_180C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | IS3 | Inventive_Sample_3 | 0BHE | US20130046061 | Inventive | NaN | NaN | 159600.0 | NaN | 11.5 | 78.0 | NaN |
| 1 | IS4 | Inventive_Sample_4 | 0BLI | US20130046061 | Inventive | NaN | NaN | 216572.0 | NaN | 8.0 | 73.0 | NaN |
| 2 | CS1 | Comparative_Sample_1 | 0D2Y | US20130046061 | NaN | NaN | NaN | NaN | NaN | NaN | 145.0 | NaN |
| 3 | CS2 | Comparative_Sample_2 | 0EYJ | US20130046061 | NaN | NaN | NaN | NaN | NaN | NaN | 314.0 | NaN |
| 4 | LDPE_133A | LDPE_133A | 2IXA | NaN | NaN | Resin_Commercial | 0.923 | NaN | 0.25 | NaN | NaN | [{'I2': 0.25, 'I10': None}] |
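The origin of the NaN entries in the merged tables can be reproduced in plain pandas. The sketch below illustrates the general idea with `pd.concat`; it is not the implementation of `dataset_merge`, and the helper name `merge_like` and the toy frames are assumptions made for illustration.

```python
import json
import pandas as pd

def merge_like(df_a, df_b, order):
    """Stack two frames whose columns differ (hypothetical helper)."""
    # Rows are stacked; a column missing from one frame is filled with NaN.
    merged = pd.concat([df_a, df_b], ignore_index=True, sort=False)
    # Columns named in `order` come first, any remaining columns follow.
    ordered = [c for c in order if c in merged.columns]
    extra = [c for c in merged.columns if c not in ordered]
    return merged[ordered + extra]

resins = pd.DataFrame({
    "Identifier": ["IS3", "IS4"],
    "Mz": [159600.0, 216572.0],
})
# A single polymer object described as a list of JSON records.
records = json.loads('[{"Identifier": "LDPE_133A", "Density": 0.923}]')
single = pd.DataFrame(records)

merged = merge_like(resins, single, order=["Identifier", "Density", "Mz"])
print(merged)
```

Each row keeps only the attributes that were defined for it, so a NaN marks an attribute that was absent from the source frame rather than a measured value.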
The class Polymer in Polymetrics allows users to condition the experimental raw data for feature development. Class Polymer handles individual polymer objects at the backend and processes experimental data measured using different analytical techniques to develop a single feature or set of features.
In the script below, the CEF data is read sequentially by looping through the polymer objects, then interpolated and plotted. The original data is plotted as markers and the interpolated data as lines. One of the polymer objects (CS4) does not have a corresponding CEF file, which results in an AttributeError. The try/except block catches the error, reports it, and lets the loop continue with the next polymer object.
#df_pat is created by selecting the patent data and filtering only the developmental resins.
df_pat = df_in[(df_in['Project'] == 'US20130046061') & (df_in['Type'] == 'Resin_Developmental')]
fig = go.Figure() # Initiating an empty plotly figure
for polymer_ in df_pat.to_dict(orient="records"):
    PE = poly.Polymer(polymer_)
    try:
        original_data, interpolated_data = PE.CEF_interpolate(minT = 30.0, maxT = 110.0, BaselineCorrection = True, Smoothening = False, Plot = False)
        fig.add_trace(go.Scatter(x=original_data['T'], # Adding traces sequentially to 'fig' within a loop
                                 y=original_data['Signal'],
                                 name=polymer_['Identifier'],
                                 showlegend = False,
                                 mode='markers'))
        fig.add_trace(go.Scatter(x=interpolated_data['T_interpolated'], # Adding traces sequentially to 'fig'
                                 y=interpolated_data['Signal_interpolated'],
                                 name=polymer_['Identifier'],
                                 line_shape='linear'))
    except Exception as error:
        # Report the failing polymer object and continue with the next one
        print('Polymetrics Error at', polymer_['Identifier'], repr(error))
        #traceback.print_exc()
fig.update_layout(title='CEF plots in US2013/0046061', xaxis_title='T deg C', yaxis_title='dWf/dT')
fig.show()
Polymetrics Error at CS4 AttributeError("'Polymer' object has no attribute 'df_CEF'")
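Because the digitized curves contain different numbers of points, they have to be resampled onto a shared temperature grid before they can be compared point by point. The sketch below does this with linear interpolation (`numpy.interp`) on synthetic curves. It illustrates the idea only and is not the algorithm inside `CEF_interpolate`; the helper name `to_common_grid`, the 0.5 degree step, and the zero fill outside the measured range are assumptions.

```python
import numpy as np

def to_common_grid(T, signal, min_T=30.0, max_T=110.0, step=0.5):
    """Resample one curve onto an evenly spaced temperature grid (hypothetical helper)."""
    T = np.asarray(T, dtype=float)
    signal = np.asarray(signal, dtype=float)
    order = np.argsort(T)  # np.interp requires increasing x values
    grid = np.arange(min_T, max_T + step / 2, step)
    # Assumption: outside the measured range the signal is set to zero.
    resampled = np.interp(grid, T[order], signal[order], left=0.0, right=0.0)
    return grid, resampled

# Two synthetic curves with 83 and 67 points, mimicking digitized data.
rng = np.random.default_rng(0)
T1 = np.sort(rng.uniform(30, 110, 83))
T2 = np.sort(rng.uniform(30, 110, 67))
y1 = np.exp(-((T1 - 80) / 8) ** 2)
y2 = np.exp(-((T2 - 95) / 5) ** 2)

grid, r1 = to_common_grid(T1, y1)
_, r2 = to_common_grid(T2, y2)
print(grid.shape, r1.shape, r2.shape)
```

After resampling, both curves have the same length and can be stacked into a single array for feature development.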
Polymer articles such as films, sheets, rigid articles, and pipes are produced by combining different polymers. Although the treatment should apply to all such products, it is demonstrated here in the context of films. Film formulations are expected to be multidimensional arrays: a film is composed of several coextrusion layers, and each layer can be made of more than one polymer.
In the example below:
The blend treatment is demonstrated on the Inventive Blend 1 sample. Inventive Blend 1 is a monolayer film produced with a 90/10 blend of IS4 and LDPE.
df_film = df_in[(df_in['Project'] == 'US20130046061') & (df_in['Type'] == 'film')]
print(df_film.loc[17,:]['Name'])
display(pd.read_csv(df_film.loc[17,:]['Formulation_FileName']))
Inventive_Blend_1
| | IS4 | LDPE_133A | Layer_Ratio |
|---|---|---|---|
| 0 | 0.9 | 0.1 | 100 |
The product_composition method calculates the average fractional composition of the article. Optionally, it looks up the components of the article in the dataset and returns their values from the specified columns.
PE = poly.Polymer(df_film.loc[17,:])
J = PE.product_composition(df_in, ['Density', 'I2'])
display(J)
| | Component | Fraction | Density | I2 |
|---|---|---|---|---|
| 0 | IS4 | 0.9 | 0.916 | 1.0 |
| 1 | LDPE_133A | 0.1 | 0.921 | 0.2 |
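For a multilayer structure, the overall fraction of each component follows from weighting its fraction within each layer by that layer's share of the film. A minimal sketch, assuming a formulation table laid out like the one displayed above (one row per layer, one column per component, plus a `Layer_Ratio` column). The two-layer film and the helper name `overall_composition` are hypothetical, and this is not the code of `product_composition`.

```python
import pandas as pd

def overall_composition(formulation, ratio_col="Layer_Ratio"):
    """Average fractional composition of a layered article (hypothetical helper)."""
    # Normalise the layer ratios so that they sum to 1.
    layer_share = formulation[ratio_col] / formulation[ratio_col].sum()
    # A component absent from a layer counts as a zero fraction.
    components = formulation.drop(columns=[ratio_col]).fillna(0.0)
    # Weight each layer's blend fractions by the layer share, then sum over layers.
    fractions = components.mul(layer_share, axis=0).sum(axis=0)
    return fractions.rename_axis("Component").reset_index(name="Fraction")

# Hypothetical two-layer film: a 90/10 blend layer (40 %) and a pure IS4 layer (60 %).
film = pd.DataFrame({
    "IS4": [0.9, 1.0],
    "LDPE_133A": [0.1, 0.0],
    "Layer_Ratio": [40, 60],
})
composition = overall_composition(film)
print(composition)
```

For a monolayer film such as Inventive Blend 1, the layer share is 1 and the result reduces to the blend fractions themselves.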
The average density of the blend is calculated as the fraction-weighted average of the densities of the individual components. The melt index (I2) of the blend is approximated by the weighted geometric mean of the component values.
blend_density = (J['Fraction']*J['Density']).sum()
blend_I2 = np.exp((J['Fraction']*np.log(J['I2'])).sum())
print('Density from Weighted average {0:.3f} g/cc'.format(blend_density))
print('I2 from weighted geometric mean {0:.2f} g/10 min'.format(blend_I2))
Density from Weighted average 0.917 g/cc
I2 from weighted geometric mean 0.85 g/10 min
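The two mixing rules can be wrapped into small reusable functions. The sketch below is one possible arrangement, not part of Polymetrics: the function names are hypothetical, the fractions are renormalised as a convenience, and the inputs are the rounded values displayed in the composition table, so the results may differ slightly from those computed from the full-precision dataset.

```python
import numpy as np

def weighted_mean(values, fractions):
    """Fraction-weighted arithmetic mean (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    fractions = np.asarray(fractions, dtype=float)
    return float(np.sum(fractions * values) / np.sum(fractions))

def weighted_geometric_mean(values, fractions):
    """Fraction-weighted geometric mean, i.e. a weighted mean of the logarithms."""
    values = np.asarray(values, dtype=float)
    fractions = np.asarray(fractions, dtype=float)
    if np.any(values <= 0):
        raise ValueError("the geometric mean requires strictly positive values")
    return float(np.exp(np.sum(fractions * np.log(values)) / np.sum(fractions)))

# Rounded component values as displayed in the composition table.
fractions = [0.9, 0.1]
density = weighted_mean([0.916, 0.921], fractions)
melt_index = weighted_geometric_mean([1.0, 0.2], fractions)
print(f"{density:.4f} g/cc, {melt_index:.3f} g/10 min")
```

The explicit check for non-positive values matters because a missing or zero melt index would otherwise propagate silently through the logarithm.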